A Model-Independent Measure of Regression Difficulty
Abstract
Keywords: data mining, machine learning, model fitting, regression, exploratory data analysis, error rate estimation, data modeling, data cleaning, data preparation, predictability
We prove an inequality that bounds the variance of the error of a regression function plus its non-smoothness, as quantified by the uniform Lipschitz condition. The coefficients in the inequality are calculated from the training data, with no assumptions about how the regression function is learned. This inequality, called the Unpredictability Inequality, lets us evaluate the difficulty of the regression problem for a given dataset before applying any regression method. The Inequality quantifies the tradeoff between prediction error and how sensitive predictions must be to predictor values. It can be applied to any convex subregion of the space X of predictors. We improve its effectiveness by partitioning X into multiple convex subregions via clustering and then applying the Inequality on each subregion. Experimental results on real data from a manufacturing line show that, combined with clustering, the Unpredictability Inequality provides considerable insight and helps in selecting a regression method.
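The tradeoff between error and sensitivity can be illustrated with an elementary pairwise bound. The sketch below is not the Unpredictability Inequality itself, whose exact form and coefficients are not stated in this abstract. It assumes only the triangle inequality: if f has Lipschitz constant L and e_i = y_i - f(x_i), then |e_i - e_j| >= |y_i - y_j| - L * ||x_i - x_j||. All function names are hypothetical, and the region labels are assumed to come from some clustering step chosen by the reader.

```python
import numpy as np

def pairwise_slopes(X, y):
    """Return |y_i - y_j| / ||x_i - x_j|| for all pairs i < j with distinct x."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y), k=1)
    dist = np.linalg.norm(X[i] - X[j], axis=1)
    keep = dist > 0  # identical predictors give no finite slope
    return np.abs(y[i] - y[j])[keep] / dist[keep]

def min_lipschitz_for_exact_fit(X, y):
    """Smallest Lipschitz constant any function fitting the data exactly can have."""
    slopes = pairwise_slopes(X, y)
    return float(slopes.max()) if slopes.size else 0.0

def min_error_gap(X, y, lipschitz):
    """Lower bound on max |e_i - e_j| for any f with the given Lipschitz constant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y), k=1)
    if i.size == 0:
        return 0.0
    dist = np.linalg.norm(X[i] - X[j], axis=1)
    gap = np.abs(y[i] - y[j]) - lipschitz * dist
    return float(np.maximum(gap, 0.0).max())

def per_region(X, y, labels):
    """Apply the exact-fit bound separately within each region label.

    Assumption: `labels` assigns every point to one subregion, for example
    from a clustering step; the clustering itself is outside this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    return {
        int(k): min_lipschitz_for_exact_fit(X[labels == k], y[labels == k])
        for k in np.unique(labels)
    }

if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0]]
    y = [0.0, 1.0, 4.0]
    print(min_lipschitz_for_exact_fit(X, y))
    print(min_error_gap(X, y, lipschitz=1.0))
    print(per_region(X, y, labels=[0, 0, 1]))
```

A small value of `min_error_gap` at a modest Lipschitz constant suggests a smooth function could fit the region well, while a large value means that either the errors or the sensitivity must be large there.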
Publication date: 2000